
Search in the Catalogues and Directories

Hits 1 – 20 of 163

1. Probing for the Usage of Grammatical Number ... (BASE)
2. Estimating the Entropy of Linguistic Distributions ... (BASE)
3. A Latent-Variable Model for Intrinsic Probing ... (BASE)
4. On Homophony and Rényi Entropy ... (BASE)
5. On Homophony and Rényi Entropy ... (BASE)
6. On Homophony and Rényi Entropy ... (BASE)
7. Towards Zero-shot Language Modeling ... (BASE)
8. Differentiable Generative Phonology ... (BASE)
9. Finding Concept-specific Biases in Form--Meaning Associations ... (BASE)
10. Searching for Search Errors in Neural Morphological Inflection ... (BASE)
11. Applying the Transformer to Character-level Transduction ... Wu, Shijie; Cotterell, Ryan; Hulden, Mans. ETH Zurich, 2021 (BASE)
12. Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models ... (BASE)
13. Probing as Quantifying Inductive Bias ... (BASE)
14. Revisiting the Uniform Information Density Hypothesis ... (BASE)
15. Revisiting the Uniform Information Density Hypothesis ... (BASE)
16. Conditional Poisson Stochastic Beams ... (BASE)
17. Examining the Inductive Bias of Neural Language Models with Artificial Languages ... (BASE)
18. Modeling the Unigram Distribution ... (BASE)
19. Language Model Evaluation Beyond Perplexity ... (BASE)
20. Differentiable Subset Pruning of Transformer Heads ... (BASE)
Abstract: Multi-head attention, a collection of several attention mechanisms that independently attend to different parts of the input, is the key ingredient in the Transformer. Recent work has shown, however, that a large proportion of the heads in a Transformer's multi-head attention mechanism can be safely pruned away without significantly harming the performance of the model; such pruning leads to models that are noticeably smaller and faster in practice. Our work introduces a new head pruning technique that we term differentiable subset pruning. Intuitively, our method learns per-head importance variables and then enforces a user-specified hard constraint on the number of unpruned heads. The importance variables are learned via stochastic gradient descent. We conduct experiments on natural language inference and machine translation; we show that differentiable subset pruning performs comparably to or better than previous work while offering precise control of the sparsity level. ...
Keywords: Computational Linguistics; Machine Learning; Machine Learning and Data Mining; Natural Language Processing
URL: https://underline.io/lecture/38190-differentiable-subset-pruning-of-transformer-heads
DOI: https://dx.doi.org/10.48448/bk2x-zy23
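
The abstract above only sketches the mechanism, so here is a minimal, illustrative PyTorch sketch of one way such a scheme could look: per-head importance logits trained by SGD, with a straight-through top-k gate enforcing a hard budget of k unpruned heads. The class name HeadGate, the budget k, and the straight-through relaxation are assumptions made for illustration; this is not the authors' implementation.

# Illustrative sketch only: per-head importance variables with a hard budget
# of k unpruned heads, kept trainable via a straight-through estimator.
# Names (HeadGate, k) and the gating details are assumptions, not the paper's code.
import torch
import torch.nn as nn


class HeadGate(nn.Module):
    """Learns one importance logit per attention head and keeps exactly k heads."""

    def __init__(self, n_heads: int, k: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_heads))  # per-head importance
        self.k = k

    def forward(self) -> torch.Tensor:
        soft = torch.sigmoid(self.logits)                  # relaxed gates in (0, 1)
        topk = soft.topk(self.k).indices                   # hard budget: k heads survive
        hard = torch.zeros_like(soft).scatter_(0, topk, 1.0)
        # Straight-through: forward pass uses the hard 0/1 mask,
        # backward pass sends gradients through the soft gates.
        return hard + soft - soft.detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    n_heads, k = 8, 3
    gate = HeadGate(n_heads, k)
    # Stand-in for the per-head outputs of a multi-head attention layer: (batch, heads, dim).
    head_out = torch.randn(4, n_heads, 16)
    mask = gate()                                          # shape: (n_heads,)
    pruned = head_out * mask.view(1, -1, 1)                # zero out pruned heads
    loss = pruned.sum()
    loss.backward()                                        # gradients reach gate.logits
    print("kept heads:", mask.detach().nonzero(as_tuple=True)[0].tolist())

The straight-through trick is the usual way to combine a hard 0/1 decision in the forward pass with usable gradients for the importance variables in the backward pass, which is what lets the sparsity level be fixed exactly rather than only encouraged by a penalty.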


Hits by source type:
Catalogues: 1
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 162